How I improved the estimate of how big a failure rate could be by 10 percentage points

Summary

My answer (the real answer): With 95% confidence, the failure rate is < 39%.

Gen AI / Stats 101 “answer”: the failure rate is \(10\% \pm 19\%\) with 95% confidence.

My value-added improvement: 10 percentage points (39% vs 29%).

The full scenario

My cubicle in my old job was something of a walk-in statistics shop, and a not-infrequent situation went something like this:

An engineer would walk in and say something like: “Our lab tested ten units and one failed. What can I conclude about the failure rate?”

After the usual pleasantries, I’d ask about the tests to see if they were independent and identical (they usually were) and how confident s/he’d like to be in the answer (90%? 99%?).

I’d run some binomial computations and give an answer like: “You can be 90% confident that the failure rate is less than 34%, and 95% confident that it’s less than 39%”. (For good measure, I’d throw in a reminder about the assumptions, and maybe a note about multiple confidence assertions.)
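For the curious, here's a minimal sketch of that kind of binomial computation (not my exact script; the function name and the choice of SciPy are just for illustration):

```python
# Sketch: exact binomial upper confidence bound on a failure rate.
# Find the b where the chance of seeing <= k failures in n trials,
# if the true rate were b, drops to 1 - L.
from scipy.stats import binom
from scipy.optimize import brentq

def upper_bound(k, n, L):
    """b solving P(X <= k | X ~ Bin(n, b)) = 1 - L (exact binomial bound)."""
    if k == n:
        return 1.0
    return brentq(lambda b: binom.cdf(k, n, b) - (1 - L), 0.0, 1.0)

print(upper_bound(1, 10, 0.90))  # ~0.337 -> "90% confident the rate is < 34%"
print(upper_bound(1, 10, 0.95))  # ~0.394 -> "95% confident the rate is < 39%"
```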

So how did I improve the estimate of how big the failure rate could be by 10 percentage points?

If instead of walking into my cubicle, the engineer had consulted AI or a Stats 101 website, they might well have gotten an answer that was some combination of “the failure rate is \(10\% \pm 19\%\)” or “you can’t conclude anything because \(np < 5\)”, based on the normal approximation.

(When asked, Perplexity both gave the formula for the normally-approximated confidence interval and said “the sample size is too small for statistical significance”.)
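For comparison, here's a sketch of the normal-approximation (Wald) arithmetic that produces the "10% ± 19%" figure, with the scenario's numbers plugged in:

```python
# Sketch: the Stats 101 / normal-approximation (Wald) interval
# the engineer might have gotten instead.
from math import sqrt

n, k, z = 10, 1, 1.96                         # z for 95% two-sided confidence
p_hat = k / n                                 # 0.10
margin = z * sqrt(p_hat * (1 - p_hat) / n)    # ~0.186, i.e. about 19 points
print(f"{p_hat:.0%} +/- {margin:.0%}")        # 10% +/- 19%, upper end ~29%
```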

So, my binomial computations prevented the engineer from getting a false sense of security about how bad the failure rate could be. And the difference between the two upper bounds is \(39\% - 29\% = 10\) percentage points.

The math

Given a confidence level \(L\) (e.g., \(0.95\)) and an observation of \(k\) failures out of \(n\) independent trials, modeled as \(X \sim \text{Bin}(n, p)\) with \(p\) the unknown failure rate, we want the smallest \(b \in [0, 1]\) for which we can assert \(p \leq b\) with \(L \times 100\%\) confidence.

Why? Because the hypothesis test \(\{H_0: p \geq b, H_a: p < b\}\) rejects \(H_0\) when $$P(X \leq k \mid X \sim \text{Bin}(n, b)) < 1 - L.$$
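For the scenario above (\(n = 10\), \(k = 1\), \(L = 0.95\)), this rejection rule works out to $$P(X \leq 1 \mid X \sim \text{Bin}(10, b)) = (1-b)^{10} + 10\,b\,(1-b)^{9} < 0.05.$$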

Consider the function \(f: [0,1] \to [0,1]\) defined by $$f(x) := P(X \leq k \mid X \sim \text{Bin}(n, x)).$$ This function is non-increasing, with \(f(0) = 1\) and \(f(1) = \begin{cases} 0, & k < n \\ 1, & k = n. \end{cases}\)

The rejection region for the hypothesis test is \((b, 1]\) where \(f(b) = 1 - L\).

Here’s an illustration with \(n=10\), \(k=1\), and \(L=0.95\):

[Figure: CDF of \(\text{Bin}(10, x)\) at \(k = 1\), as a function of \(x\), with the upper 95% confidence bound marked.]

If \(k = n\), then \(b = 1\). If \(k < n\), then for all intents and purposes \(b\) is the solution to $$P(X \leq k \mid X \sim \text{Bin}(n, b)) = 1 - L.$$ (I say "for all intents and purposes" because strictly there is no smallest \(b\): the solution itself has p-value exactly \(1 - L\), so it isn't quite in the rejection region, but \(b + \epsilon\) is for every \(\epsilon > 0\).)
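As an aside, a standard identity linking the binomial CDF to the beta distribution gives the same \(b\) without any root-finding (it's the familiar Clopper–Pearson upper limit). A quick sketch, under the same assumptions as above:

```python
# Sketch: same upper bound via the binomial-CDF / beta identity
# (Clopper-Pearson upper limit); valid for k < n.
from scipy.stats import beta

def upper_bound_beta(k, n, L):
    """b solving P(X <= k | X ~ Bin(n, b)) = 1 - L, via the Beta(k+1, n-k) quantile."""
    return 1.0 if k == n else beta.ppf(L, k + 1, n - k)

print(upper_bound_beta(1, 10, 0.95))  # ~0.394, matching the 39% figure above
```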

What other situations this solution applies to